
    Determination of Formant Features in Czech and Slovak for GMM Emotional Speech Classifier

    The paper determines formant features (FF) that describe vocal tract characteristics. It analyzes the positions of the first three formants together with their bandwidths and the formant tilts, followed by a statistical evaluation and comparison of the FF. The experiment used speech material in the form of sentences spoken by male and female speakers expressing four emotional states (joy, sadness, anger, and a neutral state) in Czech and Slovak. The statistical distributions of the analyzed formant frequencies and formant tilts differentiate well between neutral and emotional styles for both voices. In contrast, the formant 3-dB bandwidths show no correlation with the speaking style or the type of voice. These spectral parameters, together with other speech characteristics, were used in the feature vector of a Gaussian mixture model (GMM) classifier of emotional speech style that is currently under development. The overall mean classification error rate is about 18%, and the best error rate obtained is 5% for the sadness style of the female voice. These values are acceptable at this first stage of development of the GMM classifier, which is intended for evaluating the quality of synthetic speech after voice conversion and emotional speech style transformation.
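
The decision step of a GMM classifier can be sketched as follows: each speaking style has its own mixture model, and an utterance is assigned to the style whose model gives the highest total log-likelihood over its feature vectors. This is a minimal sketch, not the classifier from the paper. It assumes diagonal covariances and already-trained models, and the function names, the two-dimensional feature, and all parameter values are made up for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # log-sum-exp keeps the sum stable when the component densities are tiny
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(frames, models):
    """Return the label whose GMM gives the highest total log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)

# Toy, made-up models over a 2-D feature (say F1 and F2 in kHz); not from the paper.
models = {
    "neutral": [(0.5, [0.5, 1.5], [0.01, 0.04]), (0.5, [0.6, 1.6], [0.01, 0.04])],
    "sadness": [(0.5, [0.4, 1.2], [0.01, 0.04]), (0.5, [0.45, 1.3], [0.01, 0.04])],
}
```

In practice the mixture weights, means, and variances would be estimated from labeled training data (typically with the EM algorithm), and the feature vector would hold the formant features together with the other speech characteristics.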

    Preemphasis Influence on Harmonic Speech Model with Autoregressive Parameterization

    Autoregressive speech parameterization with and without preemphasis is discussed for the source-filter model and the harmonic model. The quality of synthetic speech is compared for the harmonic speech model using autoregressive parameterization without preemphasis, with constant preemphasis, and with adaptive preemphasis. Experimental results are evaluated by the RMS log spectral measure between the smoothed spectra of original and synthesized male, female, and child speech sampled at 8 kHz and 16 kHz. Although the harmonic model is used, the benefit of adaptive preemphasis could also hold for the source-filter model.
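
The two ingredients named in this abstract, preemphasis and the RMS log spectral measure, can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the paper: the first-order filter y[n] = x[n] - a·x[n-1] is the usual textbook form, the adaptive rule a = r(1)/r(0) is one common choice assumed here, and the function names are hypothetical.

```python
import math

def preemphasize(x, a):
    """First-order preemphasis y[n] = x[n] - a * x[n-1], with x[-1] taken as 0."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def adaptive_coefficient(frame):
    """Per-frame coefficient r(1)/r(0) from the autocorrelation (an assumed rule)."""
    r0 = sum(s * s for s in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    return r1 / r0 if r0 > 0.0 else 0.0

def rms_log_spectral_distance(spec_a, spec_b):
    """RMS difference in dB between two magnitude spectra on the same frequency grid."""
    diffs = [20.0 * math.log10(a / b) for a, b in zip(spec_a, spec_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Constant preemphasis uses one fixed value of `a` for the whole signal, while the adaptive variant recomputes it for every analysis frame. The distance function expects strictly positive magnitudes, such as samples of the smoothed spectra.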

    Automatic Text-Independent Artifact Detection, Localization, and Classification in the Synthetic Speech

    The paper describes experiments with statistical approaches to the automatic detection, localization, and classification of basic types of artifacts in synthetic speech produced by a Czech text-to-speech system using the unit selection method. The first experiment addresses artifact detection by analysis of variance (ANOVA) and hypothesis testing. The second experiment focuses on localization of the detected artifacts by Gaussian mixture models (GMM). Finally, the developed open-set artifact classifier is described. The influence of the feature vector length and structure on the resulting artifact detection accuracy is analyzed together with other factors affecting the stability of the artifact detection process. Further investigations have shown that the number of mixtures and the type of covariance matrix strongly influence the artifact classification error rate as well as the computational complexity. The experimental results confirm the functionality of the artifact detector based on ANOVA and hypothesis tests, and of the GMM-based artifact localizer and classifier. The described statistical approaches represent alternatives to standard listening tests and manual labeling of artifacts.
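
The ANOVA step can be illustrated with the one-way F statistic, which compares the variance between groups of feature values (for example, frames from clean speech and frames from a suspect segment) with the variance within the groups. This is a generic textbook computation, not the detector described in the paper. The function name and the grouping are assumptions.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

The hypothesis test then compares the statistic with the critical value of the F distribution with (k - 1, N - k) degrees of freedom at the chosen significance level. A larger value rejects the null hypothesis that all group means are equal.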
